Academic Open Internet Journal ISSN 1311-4360
Volume 23, 2008

Measuring Quality Attributes of Web-based Applications
Part-II: Analysis and Models

E-mail: rsdhawan@rediffmail.com
E-mail: rsagwal@rediffmail.com
Abstract:
Both the review and the design of the effort assessment were discussed in Part-I of this paper. This paper (Part-II) explores the variation of the effort estimate through the analysis and implementation of Web-based applications, using the effort assessment of Part-I. Here we propose a simple but efficient approach to estimating the effort needed for designing Web-based applications with the help of the RS Web Application Effort Assessment (RSWAEA) model discussed in Part-I. The RSWAEA model was designed after carrying out an empirical study with the students of an advanced university class and with web designers who used various client-server based Web technologies. Our first aim was to compare the relative importance of each Web-based design model. Second, we studied the quality of the resulting designs based on the construction of a User Behavior Model Graph (UBMG) to capture the reliability of Web-based applications. The results obtained from these assessments help to analytically identify the effort and the failure points in Web systems, and make the evaluation of the reliability of these systems simple.
1. Introduction
Reliable and precise effort assessment of high-volume Web software is critical for project selection, planning and control. Over the past thirty years, various estimation models have been developed to help managers perform estimation tasks, and this has led to a market offering of estimation tools. For organizations interested in using such estimation tools, it is crucial to know the predictive performance of the estimates such tools produce. The construction of an estimation model usually requires a set of completed projects from which a mathematical model is derived and subsequently used as the basis for the estimation of future projects. There is, therefore, a need for an estimation model for the development effort of Web projects. In this paper we point out the need for predictive metrics to measure the development effort of Web-based applications. Finally, the Web-based characteristics and parameters are used to predict the effort and duration of Web systems development. A new database has been built on the basis of a hypothetical study, conducted by providing a dataset of Web documents. An empirical study was carried out to provide effort assessment for small to large-size Web-based applications. For this paper, we analyzed many findings drawn from an experience questionnaire. The results are augmented by the questionnaire answers from a survey of the students of an advanced university class and of web designers. Our analyses suggest several areas (including reliability, usability, complexity, cost, time requirements and the nature of the Web design) where Web designers, engineers and managers would benefit from better guidance about the proper implementation of Web-based applications.
The techniques we propose have the following key objectives:
1. Derive the UBMG in a manner that captures complete details of valid sessions and the number of occurrences of invalid sessions. The valid sessions carry metrics such as session count, reliability of the session, probability of occurrence of the session, and transition probabilities of the pages in the session.
2. Derive the RSWAEA method to estimate the development effort of small to large-size projects, especially in scenarios that require fast estimation with little historical information. On the basis of the RSWAEA method, the Web-based software effort estimates are examined with respect to user cost, cost drivers, Data Web Object compatibility, usability, maintainability, complexity, configuration, time requirements, and interfaces.
3. Qualities of a good software metric
Lord Kelvin once said that when you can measure what you are speaking about and express it in numbers, you know something about it. Measurement is fundamental to any engineering discipline. The terms "measure", "measurement" and "metric" are often used interchangeably, but according to Pressman [1] a measure provides a quantitative indication of the extent, amount, dimensions, capacity or size of some attribute of a product or process, and measurement is the act of determining a measure. The IEEE Standard Glossary of Software Engineering Terms [2] defines a metric as "a quantitative measure of the degree to which a system, component, or process possesses a given attribute". Ejiogu [3] suggested that a metric should possess the following characteristics: (a) Simple and computable: it should be easy to learn how to derive the metric, and its computation should not be effort- and time-consuming; (b) Empirically and intuitively persuasive: the metric should satisfy the engineer's intuitive notion of the product under consideration, rising and falling appropriately under various conditions; (c) Consistent and objective: the metric should always yield unambiguous results, and a third party should be able to derive the same metric value from the same information; (d) Consistent in its use of units and dimensions: it uses only measures that do not lead to bizarre combinations of units; (e) Programming-language independent; (f) An effective mechanism for quality feedback. In addition to the above characteristics, Roche [4] suggests that a metric should be defined in an unambiguous manner. According to Basili [5], metrics should be tailored to best accommodate specific products and processes.
4. Implementation and Analysis of the User Behavior Model Graph (UBMG)
A UBMG can be represented in the form of a graph or a matrix [6]. In the graph view, nodes represent the pages and arcs represent transitions from one node to another. In the matrix representation, each cell (i, j) holds the probability of a transition from page i to page j. We extend the UBMG by adding an additional node to the graphical view, and an additional column in the matrix view, to represent errors encountered while traversing. The construction of the UBMG starts with the navigational model and the access logs, as described in [7], where the navigational model gives a complete overview of the different pages and of the flow between the pages in the Web system. The access logs store information such as the timestamp, page accessed, client-id, referrer-id and HTTP return code, which is used for determining session information.
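To make this extension concrete, the following is a minimal Python sketch, written for this edition as an illustration only (it is not the authors' implementation), of a UBMG stored as a transition-probability matrix whose last column represents the added error node; the page names and probabilities are hypothetical.

# Illustrative sketch of the extended UBMG: a transition-probability matrix
# whose last column represents the added error node. Page names and values
# are hypothetical, chosen only to mirror the structure described above.

PAGES = ["a", "b", "c"]           # ordinary pages (nodes)
COLUMNS = PAGES + ["Error"]       # matrix columns include the error node

# ubmg[i][j] = probability of moving from PAGES[i] to COLUMNS[j]
ubmg = [
    # a    b    c    Error
    [0.0, 0.6, 0.3, 0.1],   # transitions out of page a
    [0.0, 0.0, 0.8, 0.2],   # transitions out of page b
    [0.0, 0.0, 0.0, 0.0],   # page c: the session ends here in this sketch
]

def transition_probability(src: str, dst: str) -> float:
    """Return the probability of transitioning from page src to column dst."""
    return ubmg[PAGES.index(src)][COLUMNS.index(dst)]

if __name__ == "__main__":
    print(transition_probability("a", "Error"))   # 0.1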
A sample format of an IIS log file is shown in Figure 1.

#Fields: date time c-ip s-port cs-uri-stem cs-uri-query sc-status time-taken cs(User-Agent) cs(Referrer)

Figure 1. Format of the IIS server log file
<Date and Time> <Client-id> <URL> <Referrer-id>
2007-01-19 00:00:00 203.124.225.19 a.asp -
2007-01-19 00:00:02 203.124.225.19 b.asp -
2007-01-19 00:00:03 203.124.225.19 c.asp d.asp
2007-01-19 00:00:05 203.124.225.19 e.asp f.asp
2007-01-19 00:00:06 203.124.225.19 c.asp b.asp
2007-01-19 00:00:07 203.124.225.19 f.asp a.asp
2007-01-19 00:00:10 203.124.225.19 d.asp e.asp
Figure 2. Access log entries of the IIS server
We use the referrer-id and client-id fields as the basis for a depth-first search on the access logs. This approach segregates valid and invalid sessions. To understand this, consider an application with only two independent sessions: S1 with pages (a -> b -> c) and S2 with pages (d -> e -> f). Let the access log have the entries shown in Figure 2. If the depth-first search were based only on the client-id field, we would derive two valid sessions. However, with the referrer-id field we also detect the invalid path consisting of pages (a -> b -> f). The count of all such invalid sessions is determined, and the UBMG is constructed only for the valid sessions. Consider the example of an Online Shipping System (OSS), where the two sessions defined in the navigational model are Session 1, "Export a package", with pages PackageSelection.asp -> PackageDetails.asp -> Export.asp -> DeliveryLogistics.asp -> Payment.asp, and Session 2, "Import a package", with pages PackageSelection.asp -> PackageDetails.asp -> Import.asp -> DeliveryLogistics.asp -> Payment.asp. We tag an alias for each page as given in Figure 3:

a - PackageSelection.asp; b - PackageDetails.asp; c - Export.asp; d - Import.asp; e - DeliveryLogistics.asp; f - Payment.asp
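The sketch below, a simplification introduced here for illustration rather than the authors' algorithm, shows one way such segregation could be coded: instead of the full depth-first search, it simply checks each logged (referrer, page) transition of a client against an assumed navigational model, using the four-column log format of Figure 2.

# Illustrative sketch: segregating valid and invalid transitions from access-log
# entries of the form (timestamp, client_id, url, referrer) as in Figure 2.
# The navigational model is assumed to be given as a set of allowed
# (referrer_page, page) transitions; entry pages have referrer "-".

from collections import defaultdict

# Hypothetical navigational model for the two sessions S1: a->b->c, S2: d->e->f
ALLOWED = {("-", "a.asp"), ("a.asp", "b.asp"), ("b.asp", "c.asp"),
           ("-", "d.asp"), ("d.asp", "e.asp"), ("e.asp", "f.asp")}

LOG = [
    ("2007-01-19 00:00:00", "203.124.225.19", "a.asp", "-"),
    ("2007-01-19 00:00:02", "203.124.225.19", "b.asp", "a.asp"),
    ("2007-01-19 00:00:03", "203.124.225.19", "f.asp", "b.asp"),   # invalid hop
]

def segregate(log, allowed):
    """Split per-client page transitions into valid and invalid ones."""
    valid, invalid = defaultdict(list), defaultdict(list)
    for _ts, client, url, referrer in log:
        target = valid if (referrer, url) in allowed else invalid
        target[client].append((referrer, url))
    return valid, invalid

if __name__ == "__main__":
    ok, bad = segregate(LOG, ALLOWED)
    print(len(bad["203.124.225.19"]))   # 1 invalid transition detected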
Figure 3 shows the graphical view of the UBMG with the exit node g. The matrix of transition probabilities for this graph is shown in Table 1. The matrix considers only those sessions that completed successfully. For example, the sum of the probabilities of the paths out of node b is 0.9, indicating that 10% of clients either dropped out or encountered errors.

Figure 3. Graphical view of the UBMG

Figure 4. Addition of an error node to the UBMG

The probability of reaching a node j in the graph can be calculated using the Markov property [7, 8, 9]. The generalized relation is N_j = N_1 * P(1, j) + N_2 * P(2, j) + ... + N_k * P(k, j), where k is the number of nodes that lead to node j. In the OSS example, the probability of reaching node b is 0.4 * N_a + 0.2 * N_b + 0.2 * N_c + 0.2 * N_d, and the probability of reaching node e is 0.1 * N_c + 0.1 * N_d + 0.3 * N_e, where N_a is always equal to one. The complexity of the graphs in Figures 3, 4 and 5 can be calculated using the cyclomatic complexity (e - n + 2, or e - n + 1, or the number of nodes with two outgoing edges plus 1, where e is the total number of edges and n is the total number of nodes).
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  |
  a   |     | 0.4 |     |     |     |     |     |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |
  c   |     | 0.2 |     |     | 0.1 |     |     |
  d   |     | 0.2 |     |     | 0.1 |     |     |
  e   |     |     |     |     | 0.3 | 0.6 |     |
  f   |     |     |     |     |     |     | 0.7 |

Table 1. Matrix of transition probabilities for the OSS
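As an illustration of the relation N_j = N_1 * P(1, j) + ... + N_k * P(k, j) and of the cyclomatic complexity e - n + 2, the Python sketch below applies both to the Table 1 data; the fixed-point iteration used to solve the simultaneous equations is an assumption made here for brevity, not the solution technique prescribed by the paper.

# Illustrative sketch: expected visit rates N_j for the Table 1 UBMG using the
# Markov relation N_j = sum_i N_i * P(i, j), solved by fixed-point iteration,
# plus the cyclomatic complexity e - n + 2 of the graph.

NODES = ["a", "b", "c", "d", "e", "f", "g"]

# Transition probabilities taken from Table 1 (missing cells are zero).
P = {
    ("a", "b"): 0.4,
    ("b", "b"): 0.2, ("b", "c"): 0.3, ("b", "d"): 0.5,
    ("c", "b"): 0.2, ("c", "e"): 0.1,
    ("d", "b"): 0.2, ("d", "e"): 0.1,
    ("e", "e"): 0.3, ("e", "f"): 0.6,
    ("f", "g"): 0.7,
}

def visit_rates(p, nodes, entry="a", iterations=200):
    """Iteratively apply N_j = sum_i N_i * P(i, j) with N_entry fixed at 1."""
    n = {node: 0.0 for node in nodes}
    n[entry] = 1.0
    for _ in range(iterations):
        for j in nodes:
            if j == entry:
                continue
            n[j] = sum(n[i] * p.get((i, j), 0.0) for i in nodes)
    return n

def cyclomatic_complexity(p, nodes):
    """V(G) = e - n + 2, where e = number of edges and n = number of nodes."""
    return len(p) - len(nodes) + 2

if __name__ == "__main__":
    rates = visit_rates(P, NODES)
    print({k: round(v, 3) for k, v in rates.items()})
    print("V(G) =", cyclomatic_complexity(P, NODES))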
4.1 Failure Analysis of the UBMG
Now we extend the UBMG to include failure data. To capture the failure data, the access logs are scanned for HTTP return error codes of 4xx and 5xx, as mentioned in [10]. Besides this, errors from other servers are also considered. In principle, the error node can be reached from any page in the graphical view. We add the error node Er, and all page errors are associated with this node. The matrix of transition probabilities then has an additional column representing the error node. A cell (m, Er) of this column holds the probability of transitioning from node m to the error node Er.

Figure 5. Addition of an error node and the virtual nodes to the UBMG
Considering the OSS example, the view of the UBMG with the error node added is shown in Figure 4. The matrix of transition probabilities for Figure 4 is shown in Table 2. The matrix considers only those sessions that encountered some error; of all the requests that entered node c, 50% encountered an error. Before proceeding to the analysis of failures due to service-level agreement (SLA) violations, we define the Session Response Time (SRT) as the sum of the service times of all the pages in a session. We define the SLA at the session level and hence need the desired response time target for each session. The access log files can be used to determine the page service time (PST) values.
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  | Error |
  a   |     | 0.4 |     |     |     |     |     |       |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |       |
  c   |     | 0.2 |     |     | 0.1 |     |     |  0.5  |
  d   |     | 0.2 |     |     | 0.1 |     |     |       |
  e   |     |     |     |     | 0.3 | 0.6 |     |  0.1  |
  f   |     |     |     |     |     |     | 0.7 |       |

Table 2. Matrix of transition probabilities with the error node
For example, in the IIS Web server the time-taken field represents the time spent by the server to respond to the request. The SRT of a session is computed as the sum of the PSTs of its individual pages. Further, we compute the number of successful sessions in which the SLA was violated. Let S1 and S2 be two sessions of the OSS example. Table 3 shows the session information, where each session is represented by a unique column and includes the number of successful sessions, the number of instances of SLA violation, etc. The probability of reaching the exit node for a session is computed as the ratio of the number of exits to the number of visits at the entry page. Figure 5 shows the addition of virtual nodes to the graph of Figure 4. The matrix of transition probabilities for Figure 5 is shown in Table 4.
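A small Python sketch of this bookkeeping is given below; it is only an illustration written for this edition, with hypothetical page service times, the OSS session page lists under their Figure 3 aliases, and an assumed SLA target of 6 seconds.

# Illustrative sketch: Session Response Time (SRT) as the sum of the page
# service times (PST) of a session, and checking the SLA for each session.
# All numbers here are hypothetical.

# PST values (seconds) as they might be derived from the IIS time-taken field.
PST = {"a.asp": 1.2, "b.asp": 0.8, "c.asp": 2.5, "d.asp": 2.1,
       "e.asp": 0.9, "f.asp": 1.0}

SESSIONS = {
    "S1 (Export a package)": ["a.asp", "b.asp", "c.asp", "e.asp", "f.asp"],
    "S2 (Import a package)": ["a.asp", "b.asp", "d.asp", "e.asp", "f.asp"],
}

SLA_TARGET = 6.0   # assumed desired session response time in seconds

def session_response_time(pages, pst):
    """SRT = sum of the PSTs of the individual pages in the session."""
    return sum(pst[p] for p in pages)

if __name__ == "__main__":
    for name, pages in SESSIONS.items():
        srt = session_response_time(pages, PST)
        status = "SLA violated" if srt > SLA_TARGET else "SLA met"
        print(name, round(srt, 2), status)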
4.2 Calculation of Reliability
To account for software code-level failures, we determine the probability of encountering the failure node, P_CODE-ERROR, represented in Figure 4. To obtain this probability of reaching the error node, we formulate a set of equations from the matrix and solve the simultaneous equations using techniques such as Cramer's Rule, Matrix Inversion or Gauss-Jordan Elimination. We also compute (a) the total number of failures due to invalid sessions, N_INVALID-SESSION, and (b) the number of instances where successful sessions did not meet the SLA, N_SLA-FAIL. The probability of occurrence of invalid sessions is computed using (a). The probability of failure of a session due to (b) is computed with respect to the total number of its successful sessions; in the OSS example, the probability of such failures is 0.59 for Session 1 and 0.56 for Session 2. The probability of a session reaching the exit node but violating the SLA, and the probability of invalid sessions, also need to be computed. The total session failure probability P_SESSION-FAILURE is calculated as the sum of all the individual session failure probabilities and the probability of occurrence of invalid sessions. The overall probability of failure P_TOTAL-FAILURE of the system is calculated as the sum of the probability of reaching the error node, P_CODE-ERROR, and the probability of session failure, P_SESSION-FAILURE, for the entire system. The overall reliability R_SYSTEM of the system is then given by R_SYSTEM = 1 - P_TOTAL-FAILURE. Thus the reliability computation is driven by failures at the software code level, failures due to SLA violations, and invalid sessions.
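The composition just described can be illustrated with the short Python sketch below. The per-session SLA-violation probabilities are the Table 3 values, while the values of P_CODE-ERROR and of the invalid-session probability are hypothetical placeholders, since solving the error-node equations is outside the scope of this sketch.

# Illustrative sketch of the reliability composition described above:
#   P_SESSION-FAILURE = sum of per-session SLA-violation probabilities
#                       + probability of invalid sessions
#   P_TOTAL-FAILURE   = P_CODE-ERROR + P_SESSION-FAILURE
#   R_SYSTEM          = 1 - P_TOTAL-FAILURE
# P_CODE_ERROR and P_INVALID_SESSION below are hypothetical; the per-session
# probabilities are taken from Table 3 (row 5).

P_CODE_ERROR = 0.05                 # assumed: solved from the error-node equations
P_INVALID_SESSION = 0.02            # assumed: invalid sessions / all sessions
SESSION_SLA_VIOLATION = {"S1": 0.37, "S2": 0.32}   # Table 3, row 5

def system_reliability(p_code_error, p_invalid, per_session):
    """R_SYSTEM = 1 - (P_CODE-ERROR + P_SESSION-FAILURE)."""
    p_session_failure = sum(per_session.values()) + p_invalid
    p_total_failure = p_code_error + p_session_failure
    return 1.0 - p_total_failure

if __name__ == "__main__":
    print(round(system_reliability(P_CODE_ERROR, P_INVALID_SESSION,
                                   SESSION_SLA_VIOLATION), 3))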
Sessions                                                             | S1   | S2   |
1. Total no. of successful sessions                                  | 125  | 150  |
2. Total no. of SLA violations, N_SLA-FAIL                           | 64   | 67   |
3. Probability of failure due to (2)                                 | 0.59 | 0.56 |
4. Probability of reaching the exit node for each session            | 0.78 | 0.76 |
5. Probability of SLA violation for each session, using (3) and (4)  | 0.37 | 0.32 |

Table 3. Results of the SLA violation probability
      |  a  |  b  |  c  |  d  |  e  |  f  |  g  | Error |
  a   |     | 0.4 |     |     |     |     | 0.7 |  0.6  |
  b   |     | 0.2 | 0.3 | 0.5 |     |     |     |       |
  c   |     | 0.2 |     |     | 0.1 |     |     |  0.5  |
  d   |     | 0.2 |     |     | 0.1 |     |     |       |
  e   |     |     |     |     | 0.3 | 0.6 |     |  0.1  |
  f   |     |     |     |     |     |     | 0.7 |       |

Table 4. Matrix of transition probabilities with the error and virtual nodes
5. Effort Estimation with the RSWAEA Method
In order to help the expert achieve more accurate effort estimates, the RSWAEA method introduces a new sizing metric based on the data model of the Web-based information system to be developed: Data Web Objects (DWO). DWO is an indirect sizing metric that takes into account the characteristics of small to large-size projects. The idea behind the DWO is to identify the system functionality by analyzing its data model. DWOs are similar to other indirect metrics, such as Function Points [11], Object Points [12] or Web Objects [13], in that they represent abstract concepts used to obtain the size of the system to be developed. Thus, we can fill in the table of DWOs (Table 5) to calculate the system size. The weight assigned to each category of DWO represents the development effort of that category, and it is based on the experience of the expert estimator. As already discussed in Part-I of this paper, effort estimation methods based on the combination of WebMo and Web Objects are not appropriate for estimating the development effort of Web-based applications. Therefore, the RSWAEA method intends to be more appropriate for estimating the development effort of small to medium-size projects, especially in circumstances that require fast estimation with little historical information. Let us continue with the Visual Basic Script Web application example used previously for the UBMG. The DWO count of 96 in Table 5 represents the size of the program that would be required for this Web application.
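Since the DWO count is simply a weighted sum over the data-model categories, the following short Python sketch reproduces the 96-DWO figure from the category counts and expert-assigned weights listed in Table 5.

# Illustrative sketch: total DWO size as the weighted sum of the data-model
# categories of Table 5. Counts and weights are the example values from the
# table; in practice the weights come from the expert estimator.

DWO_CATEGORIES = [
    # (category, count, weight)
    ("Regular Entities",           5,  8),
    ("Dependent Entities",         1, 10),
    ("Relationship Entities",      3,  3),
    ("Relationship 1 to 1",        1,  3),
    ("Relationship 1 to N",        3,  6),
    ("Number of multimedia files", 2,  6),
    ("Number of scripts",          1,  4),
]

def total_dwo(categories):
    """Sum of count * weight over all DWO categories."""
    return sum(count * weight for _name, count, weight in categories)

if __name__ == "__main__":
    print(total_dwo(DWO_CATEGORIES))   # 96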
The last adjustable coefficient in RSWAEA is the constant P, the exponent applied to DWO*. This exponent is very close to 1.01, and it must be neither higher than 1.12 nor lower than 0.99. The Cost of User (CU) takes values between 0 and 5. A CU value of 0 means the system reuses all the functionality associated with that user type, so its cost is zero; on the other hand, a CU value of 5 means that there is no reuse of any functionality for that user type. The value of X* (the coefficient of DWO representativeness) lies between 1 and 1.3, depending on whether the Web-based application is small or large. The CU is a function of the user types to be supported by the system. The RSWAEA method considers three user types: Project manager, Web-designer and Counselor. The Project manager is in charge of supervising the available applications in the system, activating and deactivating functional areas of the system, and maintaining the set of applications that keep the project in constant execution. The Web-designer uses the functionality available in the system to modify and consult the stored information. The Counselor has read-only access to part of the information available in the system. In addition, the RSWAEA method also considers variable users, which are a mix of the aforementioned types, as shown in Table 6. Finally, the RSWAEA method uses a series of Cost Drivers taken from the WebMo model proposed by Reifer [13]. These Cost Drivers represent the possible development scenarios for a particular project. Such scenarios have positive and negative influences on the development process that need to be taken into account during estimation. Cost Drivers are subjective factors in the RSWAEA method, and their values are given in Table 7. Generally, Web objects consist of the Web documents (including empty and non-empty tags, image files, sound files and scripts).
Type of DWO                | Amount of DWO x Weight Factor | Total DWO |
Regular Entities           | 5 x 8                         | 40        |
Dependent Entities         | 1 x 10                        | 10        |
Relationship Entities      | 3 x 3                         | 9         |
Relationship 1 to 1        | 1 x 3                         | 3         |
Relationship 1 to N        | 3 x 6                         | 18        |
Number of multimedia files | 2 x 6                         | 12        |
Number of scripts          | 1 x 4                         | 4         |
Total of DWO               |                               | 96        |

Table 5. Definition of the DWO amounts
               | User Type       | Fraction of the Scope (I) | Reuse Degree (R) |
Fixed Users    | Project manager | 0.4                       | 0.1              |
               | Web-designer    | 0.7                       | 0.5              |
               | Counselor       | 0.8                       | 0.9              |
Variable Users | Secretary       | 0.2                       | 1.0              |
               | Area Manager    | 0.3                       | 0.9              |

Table 6. Example of a user table for different user types in a Web-based application
For this model nine cost drivers are defined: PRCLX: Product reliability and complexity (product attributes); PFDIF: Platform difficulty (platform and net server volatility); PECAP: Personnel capabilities (knowledge, skills and abilities of the work force); PEEXP: Experience of the personnel (depth and breadth of the work force experience); FACIL: Facility and infrastructure (tools, equipment and geographical distribution); SCHED: Scheduling (degree of risk assumed if the delivery time is shortened); CLIEN: Client type (technological knowledge of the client; requirements stability); WTEAM: Work team (ability to work synergistically as a team); and PROEFF: Process efficiency (development process efficiency). Each of these cost drivers is classified on a five-level scale: very low, low, normal, high and very high (VL, L, N, H, VH). To determine which level corresponds to each cost driver, the estimator uses a series of predefined tables built from historical information. Each cost driver has an assigned value for each level, and the product of these values enters the equation for calculating the effort in the RSWAEA method. The assigned values are substituted into the RSWAEA effort estimation equation to obtain the result in man-hours.
Cost drivers for the RSWAEA method
Driver | VL   | L    | N    | H    | VH   |
PRCLX  | 0.64 | 0.84 | 1.00 | 1.32 | 1.61 |
PFDIF  | 0.85 | 0.95 | 1.05 | 1.28 | 1.70 |
PECAP  | 1.52 | 1.28 | 1.02 | 0.92 | 0.85 |
PEEXP  | 1.30 | 1.14 | 1.02 | 0.90 | 0.85 |
FACIL  | 1.35 | 1.17 | 1.00 | 0.90 | 0.90 |
SCHED  | 1.40 | 1.18 | 1.00 | 0.95 | 0.95 |
CLIEN  | 1.45 | 1.25 | 1.04 | 0.88 | 0.80 |
WTEAM  | 1.45 | 1.25 | 1.00 | 0.88 | 0.85 |
PROEFF | 1.30 | 1.15 | 1.05 | 0.90 | 0.70 |

Table 7. Cost drivers of the RSWAEA method and their values
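The exact RSWAEA effort equation is given in Part-I of this paper and is not reproduced here. Purely as an illustration of how the ingredients of this section fit together, the Python sketch below combines the DWO size, the exponent P, the coefficient X*, an aggregate Cost of User and the product of the Table 7 cost-driver values in an assumed multiplicative, WebMo-style form; the specific formula used is a placeholder assumption, not the authors' equation.

# Illustrative sketch only: an assumed multiplicative combination of the RSWAEA
# ingredients described above. The real RSWAEA equation is given in Part-I of
# the paper; the functional form used here is a placeholder assumption.

from math import prod

# Cost-driver ratings for a hypothetical project, looked up in Table 7.
COST_DRIVERS = {"PRCLX": 1.32, "PFDIF": 1.05, "PECAP": 1.02, "PEEXP": 1.02,
                "FACIL": 1.00, "SCHED": 1.00, "CLIEN": 1.04, "WTEAM": 1.00,
                "PROEFF": 1.05}

DWO = 96          # system size from Table 5
P_EXP = 1.01      # exponent P (must lie between 0.99 and 1.12)
X_STAR = 1.1      # coefficient of DWO representativeness (between 1 and 1.3)
CU_TOTAL = 2.0    # assumed aggregate Cost of User over the supported user types

def rswaea_effort_sketch(dwo, p_exp, x_star, cu_total, drivers):
    """Assumed form: effort grows with size (dwo ** p_exp), scaled by X*,
    a user-cost term and the product of the cost-driver values."""
    return x_star * (dwo ** p_exp) * (1.0 + cu_total) * prod(drivers.values())

if __name__ == "__main__":
    print(round(rswaea_effort_sketch(DWO, P_EXP, X_STAR, CU_TOTAL, COST_DRIVERS), 1))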
6. Conclusions and Future Work
In this paper we have introduced an approach for determining the reliability and the effort assessment/estimation of Web-based systems, in order to obtain fast and reliable effort estimates for Web-based information system development projects. These methods, tested by offline and online analysis of Web logs, yield useful metrics such as RSWAEA, UBMG, session count and SRT, and these metrics can effectively be used for the computation of reliability and effort for small to large-size Web-based applications. Although these methods do not replace the expert estimator, they provide him or her with a tool for achieving a more accurate estimate, based on real data, in a shorter time. Estimating the cost, duration and reliability of Web developments involves a number of challenges. To handle these challenges, we have analyzed many findings drawn from experienced and expert opinions. Finally, by combining the qualities of a good software metric with an accessible Web design, we found that the proposed models achieve a better effort prediction accuracy, of up to 76.5%, and an overall Web-system reliability estimate of up to 72.5%, compared with traditional methods. Our future work may include the study of lexical analysis together with COTS components to develop a complete framework for effort assessment for authoring large-volume Web-based applications.
A major part of the research reported in this paper was carried out at U.I.E.T. and D.C.S.A., K.U.K., Haryana, India. We are highly indebted to the Ernet section of K.U.K. for their gracious help and constant support while testing our proposed models on different computer systems. The authors would also like to thank those nameless individuals who worked hard to supply the data.
[1]. Pressman, R.S., Software Engineering: A Practitioner's Approach (McGraw-Hill, 1997).
[2]. IEEE Standard Glossary of Software Engineering Terminology, IEEE Std. 610.12-1990.
[3]. Ejiogu, L., Software Engineering with Formal Metrics (QED Publishing, 1991).
[4]. Roche, J.M., Software Metrics & Measurement Principles, ACM Software Engineering Notes, Vol. 19, No. 1, pp. 76-85, 1994.
[5]. Basili, V.R., & Weiss, D.M., A Methodology for Collecting Valid Software Engineering Data, IEEE Trans. Software Engineering, Vol. SE-10, pp. 728-738, 1984.
[6]. Menasce, D.A., & Almeida, V.A.F., Scaling for E-Business: Technologies, Models, Performance, and Capacity Planning (Prentice Hall PTR, pp. 49-59, 2000).
[7]. Sengupta, S., Characterizing Web Workloads: a Transaction-Oriented View, IEEE/IFIP 5th International Workshop on Distributed Computing (IWDC 2003), 2003.
[8]. Wang, W.-L., & Tang, M.-H., User-Oriented Reliability Modeling for a Web System, 14th International Symposium on Software Reliability Engineering (ISSRE), November 17-21, 2003.
[9]. Menasce, D.A., Almeida, V.A.F., Fonseca, R., & Mendes, M.A., A Methodology for Workload Characterization of E-Commerce Sites, Proceedings of the 1st ACM Conference on Electronic Commerce, 1999.
[10]. The Internet Society, Request for Comments (RFC) 2616: Hypertext Transfer Protocol, HTTP/1.1, http://www.w3.org/Protocols/rfc2616/rfc2616.html.
[11]. International Function Point Users Group, Function Point Counting Practices Manual, http://www.ifpug.org/publications/manual.htm.
[12]. Boehm, B., Anchoring the Software Process, IEEE Software, Vol. 13, No. 4, pp. 73-82, July 1996.
[13]. Reifer, D.J., Web Development: Estimating Quick-to-Market Software, IEEE Software, Vol. 17, No. 6, pp. 57-64, November-December 2000.
About the authors:
Sanjeev Dhawan is a Lecturer in Computer Science & Engineering at Kurukshetra University, Kurukshetra, Haryana. He holds postgraduate degrees of Master of Science (M.Sc.) in Electronics, Master of Technology (M.Tech.) in Computer Science & Engineering, and Master of Computer Applications (M.C.A.), all from Kurukshetra University. At present he is pursuing a PhD in Computer Science at Kurukshetra University. His current research interests include web engineering, advanced computer architectures, Intel microprocessors, programming languages and bio-molecular computing.
Rakesh Kumar received his PhD in Computer Science and his M.C.A. from Kurukshetra University, Kurukshetra, Haryana. He is currently a Senior Lecturer in the Department of Computer Science & Applications, Kurukshetra University. His current research focuses on programming languages, information retrieval systems, software engineering, artificial intelligence, and compiler design.